Statistical Computing with R (Stat 226)
Brad Hartlaub
Fall 2022
MSSC Tutor - Andrew Nguyen
- Andrew will be available on Sunday, Tuesday, and Thursday evenings from 8:00 to 9:00 PM to help you with the material for this course.
R links
Daily Agendas
- August 26
- Generating Pi from 1000 numbers
- August 29
- August 31
- September 2
- September 5
- September 7 - Lab day for activities and HW
- September 9
- September 12
- September 14
- September 16 - Problem Session
- September 19
- September 21
- September 23 - Problem Session
- September 26
- September 28
- September 30 - Problem Session and lab time for project 1
- October 3 - Project Presentations - Activity 11 (Peer Review of presentations - respond to email messages with your comments and suggestions for improvements in the future before 5:00 pm today)
- Section 2 - 10:10
- Ngoc-Ha Vu and Tam Nguyen - YouTube API - Are you a YouTube Enthusiast?
- Joe Simon - Relievers in MLB
- Owen Breen and Chris Johnson - Data Wrangling and the Human Capital Index
- Section 1 - 11:10
- October 5 - Project Presentations - Activity 12 (Peer Review of presentations - respond to email messages with your comments and suggestions for improvements in the future before 5:00 pm today)
- Section 2 - 10:10
- Grant Culbertson - Spotify API - Exploring with SpotifyR
- Harshal Rukhaiyar and Vaughn Hajra - The Tennis "Big Three"
- Alex Thoms and Sebastian Brylka - Best Men's Tennis Players (outside of the big three)
- Section 1 - 11:10
- Alex Felleson and Junaid Fahim - School funding across state districts
- Alison Buckley and Tori Simon - The Intersection of National and Global Abortion Rates
- October 10 - Project Presentations - Activity 13 (Peer Review of presentations - respond to email messages with your comments and suggestions for improvements in the future before 5:00 pm today)
- Section 2 - 10:10
- Tori Keller - Passing, rushing, and receiving data from the NFL
- Nafi Rahman - Analyzing the top 200 players in Valorant
- Casey Watkins - Comparing carbon dioxide emissions across states and countries
- Section 1 - 11:10
- Sheetal Tallada and Yangyang Liu - 2002-2020 Alcohol Binge Drinking in 18+ Adults in the US
- Ever Croffoot-Suede - Formula 1 points and records
- Cem Tener and Luis Weekes - A Look at Drug Use and Overdose Data
- October 12 - Project Presentations - Activity 14 (Peer Review of presentations - respond to email messages with your comments and suggestions for improvements in the future before 5:00 pm today)
- Section 2 - 10:10
- Tian Zhang and Yiheng Li - Console game sales and data analysis report
- Nathan Le and Khue Tran - Beer vs. Wine: A look at worldwide supply and consumption data
- Juan Sergio Matabuena - Evolution of Basketball: the 3-point shot
- Section 1 - 11:10
- Claire Fomook and Elliot Moore - California Forest Fires
- Nick Lewis - Evolution of Basketball: the 3-point shot
- Ben Czech and Nick Hong - Mortgage rates and other macroeconomic variables
- October 14 - Project Presentations - Activity 15 (Peer Review of presentations - respond to email messages with your comments and suggestions for improvements in the future before 5:00 pm today)
- Section 2 - 10:10
- Hung Nguyen - Energy and economic activities
- Quang Nguyen - Meat Production
- Section 1 - 11:10
- Mariah Szabo - Energy and economic activities
- Jake Ritz - Powerlifting Analysis
- October 17
- October 21 - Problem Session
- October 24 - Additional information on resampling is available in BootstrappingIntro.mp4, BootstapSampling.mp4, and BootstrapCIs.mp4.
- October 26
- October 28 - Problem Session
- October 31 - Project Presentations - Activity 16 (Peer Review of presentations - respond to email messages with your comments and suggestions for improvements in the future before 5:00 pm today)
- Section 2 - 10:10
- Chris (and Ever) - Vacation in Alaska: A lesson in logistics
- Casey - Simulations and bootstrapping for Halloween applications
- Section 1 - 11:10
- Ever (and Chris) - Weather simulations for Alaska
- November 2 - Project Presentations - Activity 17 (Peer Review of presentations - respond to email messages with your comments and suggestions for improvements in the future before 5:00 pm today)
- Section 2 - 10:10
- Tam and Nathan - Concerns of an animal activist: A simulation and bootstrap project of Austin Animal Center
- Quang and Vaughn - Simulation and resampling for Fantasy Football
- Harshal and Juan - Probabilities to consider before buying an electric vehicle
- Section 1 - 11:10
- Yangyang and Cem - Columbus, OH Crumble Cookies Inspired Car Accident Simulation
- Tori and Claire - Exploring life expectancy in different countries
- Drew and Jake - Oreo taste testing
- November 4 - Project Presentations - Activity 18 (Peer Review of presentations - respond to email messages with your comments and suggestions for improvements in the future before 5:00 pm today)
- Section 2 - 10:10
- Tori Keller - Premier League Goals
- Khue and Ha - Pirates Worldwide: Bootstrap and Simulation
- Section 1 - 11:10
- Luis and Junaid - Simulating answers for car insurers
- Nick Hong and Nick Lewis - Hotel demand
- Sheetal and Mariah - Flu vaccines in the United States
- November 7 - Project Presentations - Activity 19 (Peer Review of presentations - respond to email messages with your comments and suggestions for improvements in the future before 5:00 pm today)
- Section 2 - 10:10
- Sebastian - Predicting soccer results
- Grant Culbertson - My tenure at UPS: An analyzation of workplace data
- Nafi and Owen - Horse racing and betting
- Section 1 - 11:10
- Alex - Driving simulations and waitlisting
- Ben - Simulations for police traffic stops in Rhode Island
- November 9 - Project Presentations - Activity 20 (Peer Review of presentations - respond to email messages with your comments and suggestions for improvements in the future before 5:00 pm today)
- Section 2 - 10:10
- Hung - Fishing in Lake Erie
- Tian - House simulation
- Yiheng - Gacha game simulation
- Section 1 - 11:10
- Joe - Avalanche simulations
- Alex - Bootstrapping and simulation for chick weights
- Alison and Elliot - Simulations for earthquakes
- November 11 - Monte Carlo simulation studies - Please read the article Use of R as a Toolbox for Mathematical Statistics Exploration in the file horton-tas.pdf in our !Class Material folder on Google Drive.
- November 14
- November 16
- Section 2 - 10:10
- Vaughn Hajra - ANOVA and Kruskal-Wallis: A Monte Carlo power study
- Section 1 - 11:10
- Luis Weekes - ANOVA and Kruskal-Wallis: A Monte Carlo power study
- November 18
- Section 2 - 10:10
- Tori Keller - Happiness
- Sebastian Brylka and Joe Simon - Decision trees and random forests
- Harshal Rukhaiyar and Nafi Rahman - Text as data: #BlackFriday
- Section 1 - 11:10
- Alison Buckley and Tori Simon - Geospatial data
- Elliot Moore - A measure of association for nominal categorical variables
- November 28
- Section 2 - 10:10
- Tam Nguyen and Ngoc Ha Vu - Natural language processing: sentiment analysis with R
- Casey Watkins - How does AI divide America? A lesson in unsupervised learning
- Khue Tran and Nathan Le - Monte Carlo power comparisons for the one sample t-test and one sample Wilcoxon signed-rank test
- Section 1 - 11:10
- Ever Croffoot-Suede - Ordinal regression and preventing overfitting
- November 30
- Section 2 - 10:10
- Hung Nguyen - Pharmacometric models with R
- Quang Nguyen - Monte Carlo power study of two-way ANOVA block design and Friedman's test
- Grant Culbertson - Shiny apps: making R accessible
- Section 1 - 11:10
- Yangyang Liu - Monte Carlo power study of two sample t tests and nonparametric competitors
- Ben Czech and Jake Ritz - Text as data (Chapter 19)
- December 2
- Section 2 - 10:10
- Alex Thoms - Markov chains and infectious disease rates
- Chris Johnson - Monte Carlo power study for correlation coefficients
- Section 1 - 11:10
- Cem Tener and Drew Grier - Self organizing maps: clustering based on hitter performance
- Nick Hong - Time series with dynamic and customized data graphics
- Sheetal Tallada - Comparing Tukey HSD and Kruskal multiple comparisons procedures
- December 5
- Section 2 - 10:10
- Juan Sergio Matabuena and Nick Lewis - Text as data: Spotify API
- Section 1 - 11:10
- Alex Felleson - Markov chains
- Mariah Szabo and Owen Breen - Geospatial data
- Claire Fomook - Text as data
- December 7 - Individual consultations
- December 9 - Individual consultations (Overall standing)
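Several of the presentations listed above are Monte Carlo power studies. As a rough illustration of what such a study involves, the sketch below compares the two-sample t test with the Wilcoxon rank-sum test on normal data. The sample size, shift, significance level, and number of simulations are assumptions chosen only for illustration, not settings taken from any student project.

```r
# Monte Carlo power comparison: two-sample t test vs. Wilcoxon rank-sum test.
# All settings below (n, shift, alpha, n_sims) are illustrative assumptions.
set.seed(226)

n      <- 20     # observations per group
shift  <- 1      # true difference in means under the alternative
alpha  <- 0.05   # significance level
n_sims <- 2000   # number of simulated data sets

# Simulate one pair of samples and record whether each test rejects H0.
one_trial <- function(n, shift, alpha) {
  x <- rnorm(n)
  y <- rnorm(n, mean = shift)
  c(t_test   = t.test(x, y)$p.value < alpha,
    wilcoxon = wilcox.test(x, y)$p.value < alpha)
}

# Result is a 2 x n_sims logical matrix, one column per simulated data set.
rejections <- replicate(n_sims, one_trial(n, shift, alpha))

# Estimated power = proportion of simulated data sets in which H0 is rejected.
power_hat <- rowMeans(rejections)
print(power_hat)
```

Replacing `rnorm()` with a skewed or heavy-tailed generator such as `rexp()` or `rt()` is one way to see how the comparison changes when the normality assumption fails.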
- Final Projects
- Each student will conduct a detailed simulation to solve a probability or statistical problem of interest. Ideally, this simulation will be related to a research problem of interest to you. The case studies and student projects in our textbook and other resources serve as great examples for reasonable projects. Summaries of your proposed simulation must be submitted on or before Monday, November 28. Final papers should be 12 to 15 pages in length and explain the problem of interest, your analysis, and your conclusions. Your paper and supporting R code must be submitted to your Google Drive folder on or before the final exam time assigned by the Registrar. The deadline for the 10:10 section is December 15 at 1:30 pm, and the deadline for the 11:10 section is December 16 at 1:30 pm.
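As a very small example of using simulation to solve a probability problem, the sketch below estimates the chance that at least two people in a group share a birthday and compares it with the exact answer. The birthday problem, the group size of 23, and the number of simulations are assumptions chosen for illustration; a final project would be far more detailed.

```r
# Simulating a probability: the classic birthday problem.
# Assumes 365 equally likely birthdays; all settings are illustrative.
set.seed(226)

n_people <- 23      # group size
n_sims   <- 10000   # number of simulated groups

# TRUE if at least two of the sampled birthdays coincide.
shared_birthday <- function(n_people) {
  birthdays <- sample(365, n_people, replace = TRUE)
  anyDuplicated(birthdays) > 0
}

# Simulated probability = proportion of groups with a shared birthday.
p_sim <- mean(replicate(n_sims, shared_birthday(n_people)))

# Exact value for comparison: 1 - P(all birthdays are distinct).
p_exact <- 1 - prod((365 - 0:(n_people - 1)) / 365)

print(c(simulated = p_sim, exact = p_exact))
```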
Homework Assignments
- Your solutions must be submitted electronically to your Google Drive folder. You may use any software that you want, but please submit a PDF file with your written solutions. For example, the name of the file for the first homework assignment should be HW1-yourname.PDF.
- Activity #1 - due on Wednesday, August 31
- Activity #2 - due on Friday, September 2
- Activity #3 - due on Monday, September 5
- HW #1 - due on Friday, September 9
- HW #2 - due on Friday, September 16
- HW #3 - due on Friday, September 23
- HW #4 - due on Friday, September 30
- Small Group Project #1 (Data Wrangling) - presentations will begin on Monday, October 3 - PPT or PDF is due on the day of your presentation
- Create an R script or markdown file that applies the data wrangling methods from Part I of our course (Chapters 1-8) to interesting data and practical problems of interest to you. You may work by yourself or with a partner on this project. Ideally, your solution will have several different approaches in R. During your presentation, you should introduce at least one new function to the class (the function could be a user-defined function, but it is not required to be a function that you write) and include at least one appropriate visual summary. In short, I want you to be creative with these projects, which should help your peers see the wide variety of applications for the data wrangling skills we have learned so far in the course. Your presentation to the class should be 10 to 12 minutes.
- Upload activities 4 through 10 (PDF only) to your Google Drive HW folder before class on Monday, October 17
- HW #5 - due on Friday, October 21
- HW #6 - due on Friday, October 28
- Small Group Project #2 (Bootstrapping and simulation) - presentations will begin on Monday, October 31 - PPT or PDF is due on the day of your presentation
- Create an R script or markdown file that applies bootstrapping and simulation for at least three probability distributions to practical problems of interest to you. You may work by yourself or with a partner on this project. If you work with a partner, it must be a different partner than you worked with on the data wrangling project. The goal of the bootstrapping component is for you to simulate and apply the sampling distribution of a statistic where you do not know the theoretical distribution. That is, do not apply bootstrapping methods for a sample mean or the difference in two sample means. The goal of the simulation component is for you to learn how to simulate probabilities using R. If you want to check the accuracy of your simulation with a theoretical solution that is fine, but the focus of these probability applications should be on simulation. You must include at least one continuous distribution and at least one discrete distribution during your presentation. Presentations to the class should last approximately 12 minutes.
- Small Group Project #3 - presentations will begin on Wednesday, November 16 - your files (PPT or PDF and your R code) are due on the day of your presentation
- Option 1 - A Monte Carlo power study of at least two competing test procedures
- Option 2 - Introduce new material from Modern Data Science with R or R for Data Science or some other statistical computing text or article.
- You may work by yourself or with a partner on this project. Your presentation should be approximately 13 minutes, and the question and answer session after each presentation will be no longer than 5 minutes.
- Reminder - your final project proposals are due on Monday, November 28
- HW #7 - due on Friday, December 2
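For the bootstrapping component of Small Group Project #2, the idea of approximating the sampling distribution of a statistic other than the mean can be sketched as follows. The sample median is used here only as an example statistic, and the data are a hypothetical simulated sample; the number of resamples `B` is likewise an assumption.

```r
# Bootstrap the sampling distribution of the sample median.
# The data below are a hypothetical right-skewed sample standing in for real data.
set.seed(226)

x <- rexp(50, rate = 1 / 10)   # assumed sample of size 50
B <- 5000                      # number of bootstrap resamples (assumption)

# Resample the data with replacement and recompute the median each time.
boot_medians <- replicate(B, median(sample(x, replace = TRUE)))

# Bootstrap standard error and 95% percentile confidence interval.
se_boot       <- sd(boot_medians)
ci_percentile <- quantile(boot_medians, c(0.025, 0.975))

print(median(x))
print(se_boot)
print(ci_percentile)
```

A histogram of `boot_medians` (for example, with `hist(boot_medians)`) gives a visual summary of the approximate sampling distribution.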
Interesting Links